-
- WNILS Working Group                                   Chris Weider
- INTERNET-DRAFT                                        Merit Network, Inc.
-                                                       Jim Fullton
-                                                       CNIDR
-                                                       Simon Spero
- 3/26/93                                               UNC Chapel Hill
-
-
- Architecture of the Whois++ Index Service
-
- Status of this memo:
-
- The authors describe an architecture for indexing in distributed databases,
- and apply this to the WHOIS++ protocol.
-
-
- This document is an Internet Draft. Internet Drafts are working
- documents of the Internet Engineering Task Force (IETF), its Areas,
- and its Working Groups. Note that other groups may also distribute
- working documents as Internet Drafts.
-
- Internet Drafts are draft documents valid for a maximum of six
- months. Internet Drafts may be updated, replaced, or obsoleted
- by other documents at any time. It is not appropriate to use
- Internet Drafts as reference material or to cite them other than
- as a "working draft" or "work in progress."
-
- Please check the I-D abstract listing contained in each Internet
- Draft directory to learn the current status of this or any
- other Internet Draft.
-
- This Internet Draft expires October 1, 1993.
-
- 1. Purpose:
-
- The WHOIS++ directory service [Deutsch, et al, 1992] is intended to provide
- a simple, extensible directory service predicated on a template-based
- information model and a flexible query language. This document describes
- an architecture designed to link together many of these WHOIS++ servers
- into a distributed, searchable wide area directory service.
-
- 2. Scope:
-
- This document details a distributed, easily maintained architecture for
- providing a unified index to a large number of distributed WHOIS++
- servers. This architecture can be used with systems other than WHOIS++ to
- provide a distributed directory service which is also searchable.
-
- 3. Motivation and Introduction:
-
- It seems clear that with the vast amount of directory information potentially
- available on the Internet, it is simply infeasible to build a centralized
- directory to serve all this information. Therefore, we should look at building
- a distributed directory service. If we are to distribute the directory service,
- the easiest (although not necessarily the best) way of building the directory
- service is to build a hierarchy of directory information collection agents.
- In this architecture, a directory query is delivered to a certain agent
- in the tree, and then handed up or down, as appropriate, so that the query
- is delivered to the agent which holds the information which fills the query.
- This approach has been tried before, most notably in some implementations of
- the X.500 standard. However, there are a number of major flaws with the approach
- as it has been taken. This new Index Service is designed to fix these flaws.
-
-
- WNILS Working Group Whois++ Index Service Weider, et al.
-
-
- 3.1 The search problem
-
- One of the primary assumptions made by recent implementations of distributed
- directory services is that every entry resides in some location in a
- hierarchical name space. While this arrangement is ideal for reading the entry once
- one knows its location, it is not as good when one is searching for the location
- in the namespace of those entries which meet some set of criteria. If the only
- criteria we know about a desired entry are items which do not appear in the
- namespace, we are forced to do a global query. Whenever we issue a global
- query (at the root of the namespace), or a query at the top of a given subtree
- in the namespace, that query is replicated to _all_ subtrees of the starting
- point. The replication of the query to all subtrees is not necessarily a
- problem; queries are cheap. However, every server to which the query has been
- replicated must process that query, even if it has no entries which match
- the specified criteria. This part of the global query processing is quite
- expensive. A poorly designed namespace or a thin namespace can cause the
- vast majority of queries to be replicated globally, but a very broad
- namespace can cause its own navigation problems. Because of these problems,
- search has been turned off at high levels of the X.500 namespace.
-
- 3.2 The location problem
-
- With global search turned off, one must know in advance how the name space is
- laid out so that one can guide a query to a proper location. Also, the layout
- of the namespace then becomes critical to a user's ability to find the
- desired information. Thus there are endless battles about how to lay out the
- name space to best serve a given set of users, and enormous headaches whenever
- it becomes apparent that the current namespace is unsuited to the current
- usages and must be changed (as recently happened in X.500). Also, assuming
- one does impose multiple hierarchies on the entries through use of the
- namespace, the mechanisms to maintain these multiple hierarchies in X.500 do
- not exist yet, and it is possible to move entries out from under their
- pointers. Finally, there is as yet no agreement on how the X.500 namespace
- should look, even for the White Pages type of information that is currently
- installed in the X.500 pilot project.
-
- 3.3 The Yellow Pages problem
-
- Current implementations of this hierarchical architecture have also been
- unsuited to solving the Yellow Pages problem; that is, the problem of
- easily and flexibly building special-purpose directories (say of molecular
- biologists) and of automatically maintaining these directories once they have
- been built. In particular, the attributes appropriate to the new directory
- must be built into the namespace because that is the only way to segregate
- related entries into a place where they can be found without a global
- search. Also, there is a classification problem; how does one adequately
- specify the proper categories so that people other than the creator of the
- directory can find the correct subtree? Additionally, there is the problem
- of actually finding the data to put into the subtree; if one must traverse
- the hierarchy to find the data, we have to look globally for the proper
- entries.
-
- 3.4 Solutions
-
- We'll hold off for a moment on describing the actual architecture used in
- our solution to these problems and concentrate on a high level description of
- what solutions are provided by our conceptual approach. To begin with,
- although every entry in WHOIS++ does indeed have a unique identifier
- (resides in a specific location in the namespace), the navigational algorithms
- to reach a specific entry do not necessarily depend on the identifier the
- entry has been assigned. The Index Service gets around the namespace and
- hierarchy problems by creating a directory mesh on top of the entries.
- Each layer of the mesh has a set of 'forward knowledge' which indicates the
- contents of the various servers at the next lower layer of the mesh. Thus
- when a query is received by a server in a given layer of the mesh, it can
- prune the search tree and hand the query off to only those lower level servers
- which have indicated that they might be able to answer it. Thus search becomes
- feasible at all levels of the mesh. In the current version of this architecture,
- we have chosen a certain set of information to hand up the mesh as forward
- knowledge. This may or may not be exactly the set of information required to
- construct a truly searchable directory, but the protocol itself doesn't
- restrict the types of information which can be handed around.
-
- Another benefit provided by the mesh of index servers is that since the
- entry identification scheme has been decoupled from the navigation service,
- multiple hierarchies can be built and easily maintained on top of the
- existing data. Also, the user does not need to know in advance where in the
- mesh the entry is contained.
-
- Also, the Yellow Pages problem now becomes tractable, as the index servers
- can pick and choose between information proffered by a given server;
- because we have an architecture that allows for automatic polling of data,
- special purpose directories become easy to construct and to maintain.
-
-
- 4. Components of the Index Service:
-
- 4.1 WHOIS++ servers
-
- The whois++ service is described in [Deutsch, et al, 1992]. As that service
- specifies only the query language, the information model, and the server
- responses, whois++ services can be provided by a wide variety of databases
- and directory services. However, to participate in the Index Service, that
- underlying database must also be able to generate a 'centroid', or some other
- type of forward knowledge, for the data it serves.
-
- 4.2 Centroids as forward knowledge
-
- The centroid of a server consists of a list of the templates and
- attributes used by that server, and a word list for each attribute.
- The word list for a given attribute contains one occurrence of every
- word which appears at least once in that attribute in some record in that
- server's data, and nothing else.
-
- For example, if a whois++ server contains exactly three records, as follows:
-
-    Record 1                          Record 2
-    Template: User                    Template: User
-    First Name: John                  First Name: Joe
-    Last Name: Smith                  Last Name: Smith
-    Favourite Drink: Labatt Beer      Favourite Drink: Molson Beer
-
-    Record 3
-    Template: Domain
-    Domain Name: foo.edu
-    Contact Name: Mike Foobar
-
- the centroid for this server would be
-
-    Template:         User
-    First Name:       Joe
-                      John
-    Last Name:        Smith
-    Favourite Drink:  Beer
-                      Labatt
-                      Molson
-
-    Template:         Domain
-    Domain Name:      foo.edu
-    Contact Name:     Mike
-                      Foobar
-
- It is this information which is handed up the tree to provide forward knowledge.
- As we mention above, this may not turn out to be the ideal solution for
- forward knowledge, and we suspect that there may be a number of different
- sets of forward knowledge used in the Index Service. However, the directory
- architecture is in a very real sense independent of what types of forward
- knowledge are handed around, and it is entirely possible to build a
- unified directory which uses many types of forward knowledge.
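-
The construction in section 4.2 can be sketched in a few lines of Python. This is only an illustration: the function name build_centroid, the dictionary layout of a record, and splitting words on whitespace are all assumptions, since the draft does not define how an attribute value is broken into words.

```python
from collections import defaultdict

def build_centroid(records):
    """Return {template: {attribute: sorted list of unique words}}.

    Assumption: a record is a dict with a "Template" key; every other key
    is an attribute whose value is split on whitespace.
    """
    words = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            # A set keeps exactly one occurrence of each word.
            words[template][attribute].update(value.split())
    return {
        template: {attr: sorted(ws) for attr, ws in attrs.items()}
        for template, attrs in words.items()
    }

# The three records from the example above.
records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Joe", "Last Name": "Smith",
     "Favourite Drink": "Molson Beer"},
    {"Template": "Domain", "Domain Name": "foo.edu",
     "Contact Name": "Mike Foobar"},
]
centroid = build_centroid(records)
```

The sketch sorts each word list only to make the result deterministic; the draft does not require any particular ordering.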
-
-
- 4.3 Index servers and Index server Architecture
-
- A whois++ index server collects and collates the centroids (or other forward
- knowledge) of either a number of whois++ servers or of a number of other index
- servers. An index server must be able to generate a centroid for the
- information it contains.
-
- 4.3.1 Queries to index servers
-
- An index server will take a query in standard whois++ format, search its
- collections of centroids, determine which servers hold records which may fill
- that query, and then either a) forward the query to the appropriate servers
- on behalf of the user, or b) notify the user's client of the next servers
- to contact to submit the query.
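-
A rough sketch of the centroid search step follows. It assumes a query has already been reduced to a single (template, attribute, word) triple, which is much simpler than the real whois++ query language, and the server handles and the function name servers_that_may_answer are hypothetical.

```python
def servers_that_may_answer(centroids, template, attribute, word):
    """Return the handles of servers whose centroid contains `word`.

    centroids maps a server handle to {template: {attribute: set of words}}.
    A hit means the server *may* hold a matching record, not that it does.
    """
    matches = []
    for handle, centroid in centroids.items():
        if word in centroid.get(template, {}).get(attribute, ()):
            matches.append(handle)
    return sorted(matches)

# Hypothetical centroids collected by one index server.
centroids = {
    "server-a": {"User": {"Last Name": {"Smith"}, "First Name": {"Joe", "John"}}},
    "server-b": {"User": {"Last Name": {"Jones"}}},
    "server-c": {"Domain": {"Domain Name": {"foo.edu"}}},
}
candidates = servers_that_may_answer(centroids, "User", "Last Name", "Smith")
```

Servers that do not appear in the result never see the query, which is where the saving over a global search comes from.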
-
- 4.3.2 Index server distribution model and centroid propagation
-
- The diagram below illustrates how a mesh of index servers is
- created for a set of whois++ servers.
-
-      whois++                index                  index
-      servers                servers                servers
-                             for                    for
-                             whois++                lower-level
-                             servers                index servers
-       _______
-      |       |
-      |   A   |__
-      |_______|  \            _______
-                  \----------|       |
-       _______               |   D   |__             ______
-      |       |   /----------|_______|  \           |      |
-      |   B   |__/                       \----------|      |
-      |_______|                                     |  F   |
-                                         /----------|______|
-                                        /
-       _______                _______  /
-      |       |              |       |-
-      |   C   |--------------|   E   |
-      |_______|              |_______|-
-                                       \
-                                        \
-       _______                           \            ______
-      |       |                           \----------|      |
-      |   G   |--------------------------------------|  H   |
-      |_______|                                      |______|
-
-
- Figure 1: Sample layout of the Index Service mesh
- _______________________________________________________________________________
-
-
- In the portion of the index tree shown above, whois++ servers A and B hand their
- centroids up to index server D, whois++ server C hands its centroid up to
- index server E, and index servers D and E hand their centroids up to index
- server F. Servers E and G also hand their centroids up to H.
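-
The polling relationships of Figure 1 can be written down as a small table and used to show how a query is pruned on its way down the mesh. The sketch below makes several simplifying assumptions: each centroid is collapsed to a flat set of words, an index server's centroid is taken to be the union of the centroids it has collected, and the word lists themselves are invented for illustration.

```python
# Polling relationships from Figure 1: index server -> servers it polls.
POLLS = {"D": ["A", "B"], "E": ["C"], "F": ["D", "E"], "H": ["E", "G"]}

# Hypothetical word lists held by the base-level whois++ servers.
BASE_WORDS = {"A": {"smith"}, "B": {"jones"}, "C": {"foobar"}, "G": {"weider"}}

def centroid(server):
    """Flat centroid: a base server's own words, or the union of its children's."""
    if server in BASE_WORDS:
        return set(BASE_WORDS[server])
    merged = set()
    for child in POLLS[server]:
        merged |= centroid(child)
    return merged

def resolve(server, word, visited=None):
    """Return (base servers that may hold `word`, every server contacted)."""
    visited = [] if visited is None else visited
    visited.append(server)
    if server in BASE_WORDS:
        return ([server] if word in BASE_WORDS[server] else []), visited
    hits = []
    for child in POLLS[server]:
        if word in centroid(child):      # prune subtrees that cannot match
            found, _ = resolve(child, word, visited)
            hits.extend(found)
    return hits, visited

hits, contacted = resolve("F", "smith")
```

In this run only F, D and A are contacted; B, C and E never see the query.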
-
- The number of levels of index servers, and the number of index servers at each
- level, will depend on the number of whois++ servers deployed, and the response
- time of individual layers of the server tree. These numbers will have to
- be determined in the field.
-
- 4.3.4 Centroid propagation and changes to centroids
-
- Centroid propagation is initiated by an authenticated POLL command (sec. 5.2).
- The format of the POLL command allows the poller to request the centroid of
- any or all templates and attributes held by the polled server. After the
- polled server has authenticated the poller, it determines which of the
- requested centroids the poller is allowed to request, and then issues a
- CENTROID-CHANGES report (sec. 5.3) to transmit the data. When the poller
- receives the CENTROID-CHANGES report, it can authenticate the pollee to
- determine whether to add the centroid changes to its data. Additionally, if
- a given pollee knows what pollers hold centroids from the pollee, it can
- signal to those pollers the fact that its centroid has changed by issuing
- a DATA-CHANGED command. The poller can then determine if and when to
- issue a new POLL request to get the updated information. The DATA-CHANGED
- command is included in this protocol to allow 'interactive' updating of
- critical information.
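-
The pollee's side of this exchange might look roughly like the following. Everything here is a stand-in: authentication is reduced to an allow-list of poller handles, the centroid is a flat set of words, and the class and method names are hypothetical rather than part of the protocol.

```python
class Pollee:
    """Toy polled server: remembers its pollers and reports centroid changes."""

    def __init__(self, words, allowed_pollers):
        self.words = set(words)
        self.allowed = set(allowed_pollers)   # stand-in for real authentication
        self.known_pollers = set()

    def handle_poll(self, poller_handle):
        """Return the full word list, or None if the poller is not authorised."""
        if poller_handle not in self.allowed:
            return None
        # Cache the handle so that later changes can be announced.
        self.known_pollers.add(poller_handle)
        return sorted(self.words)

    def add_word(self, word):
        """Change the data; return the pollers that should get DATA-CHANGED."""
        if word in self.words:
            return []                         # centroid unchanged, nothing to say
        self.words.add(word)
        return sorted(self.known_pollers)

pollee = Pollee({"smith"}, allowed_pollers={"index-d"})
first_poll = pollee.handle_poll("index-d")
to_notify = pollee.add_word("jones")
```

Whether and when a notified poller issues a new POLL remains the poller's decision, as the text above describes.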
-
- 4.3.5 Query handling and passing algorithms
-
- When an index server receives a query, it searches its collection of centroids,
- and determines which servers hold records which may fill that query. As
- whois++ becomes widely deployed, it is expected that some index servers
- may specialize in indexing certain whois++ templates or perhaps even
- certain fields within those templates. If an index server obtains a match
- with the query _for those template fields and attributes the server indexes_,
- it is to be considered a match for the purpose of forwarding the query.
- There are two methods of forwarding a query, called 'chaining' and 'referral'.
-
- 4.3.5.1 Query referral
-
- Query referral is the process of informing a client which servers to contact
- next to resolve a query. The syntax for notifying a client is outlined in
- section 5.5.
-
- 4.3.5.2 Query chaining
-
- Query chaining is done when the queried index server takes responsibility for
- resubmitting the query to the appropriate lower servers. The server
- forwards the query using the syntax in section 5.4, and after that takes
- no further responsibility for the query. A whois++ query can specify the
- 'trace' option, which causes each server which receives the query to
- send its IANA handle and an identification string to the client.
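-
The difference between the two forwarding methods can be sketched as a single dispatch function. The send callback, the dictionary replies and the mode strings are assumptions made for the example; the actual message formats are those given in sections 5.4 and 5.5.

```python
def handle_query(query, matching_servers, mode, send):
    """Dispatch a query by 'chaining' or 'referral'.

    `send(server, query)` is a hypothetical transport callback.
    """
    if mode == "referral":
        # Tell the client where to go next; the client resubmits the query.
        return {"type": "SERVERS-TO-ASK", "query": query,
                "next_servers": list(matching_servers)}
    if mode == "chaining":
        # Resubmit on the client's behalf, then take no further responsibility.
        for server in matching_servers:
            send(server, query)
        return {"type": "FORWARDED", "forwarded_to": list(matching_servers)}
    raise ValueError(f"unknown mode: {mode!r}")

sent = []
chained = handle_query("smith", ["D", "E"], "chaining",
                       lambda server, query: sent.append((server, query)))
referred = handle_query("smith", ["D", "E"], "referral",
                        lambda server, query: sent.append((server, query)))
```

With referral the index server sends nothing itself, so only the chaining call adds entries to sent.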
-
- 5. Syntax for operations of the Index Service:
-
- 5.1 Data changed syntax
-
- The data changed template looks like this:
-
- DATA-CHANGED:
-    Version-number: // version number of index service software, used to
-                    // ensure compatibility
-    Time-of-latest-centroid-change: // time stamp of latest centroid change, GMT
-    Time-of-message-generation: // time when this message was generated, GMT
-    Server-handle: // IANA unique identifier for this server
-    Best-time-to-poll: // For heavily used servers, this will identify when
-                       // the server is likely to be lightly loaded
-                       // so that response to the poll will be speedy, GMT
-    Authentication-type: // Type of authentication used by server, or NONE
-    Authentication-data: // data for authentication
- END DATA-CHANGED // This line must be used to terminate the data changed
-                  // message
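-
As an illustration, a DATA-CHANGED message could be assembled as follows. The draft does not fix a timestamp layout, a line terminator for the header fields, or which fields may be omitted, so the YYYYMMDDHHMMSS format, the CR/LF line endings, the version string "1.0" and the omission of Authentication-data when the type is NONE are all assumptions of this sketch.

```python
from datetime import datetime, timezone

def _stamp(moment):
    """Render an aware datetime in GMT; the YYYYMMDDHHMMSS layout is assumed."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")

def format_message(name, fields):
    """Serialise ordered (field, value) pairs between 'NAME:' and 'END NAME'."""
    lines = [f"{name}:"]
    lines += [f"{key}: {value}" for key, value in fields]
    lines.append(f"END {name}")
    return "\r\n".join(lines) + "\r\n"

def data_changed(server_handle, changed_at, generated_at, best_poll_time,
                 version="1.0"):
    """Build a DATA-CHANGED message with no authentication."""
    return format_message("DATA-CHANGED", [
        ("Version-number", version),
        ("Time-of-latest-centroid-change", _stamp(changed_at)),
        ("Time-of-message-generation", _stamp(generated_at)),
        ("Server-handle", server_handle),
        ("Best-time-to-poll", _stamp(best_poll_time)),
        ("Authentication-type", "NONE"),
    ])

changed = datetime(1993, 3, 26, 12, 0, 0, tzinfo=timezone.utc)
message = data_changed("example-handle", changed, changed, changed)
```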
-
- 5.2 Polling syntax
-
- POLL:
-    Version-number: // version number of poller's index software, used to
-                    // ensure compatibility
-    Start-time: // give me all the centroid changes starting at this time, GMT
-    End-time: // ending at this time, GMT
-    Template: // a standard whois++ template name, or the keyword ALL, for a
-              // full update.
-    Field: // used to limit centroid update information to specific fields,
-           // is either a specific field name, a list of field names,
-           // or the keyword ALL
-    Server-handle: // IANA unique identifier for the polling server.
-                   // this handle may optionally be cached by the polled
-                   // server to announce future changes
-    Authentication-type: // Type of authentication used by poller, or NONE
-    Authentication-data: // Data for authentication
- END POLL // This line must be used to terminate the poll message
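-
A polled server has to read this message back. The parser below is a sketch only: it assumes one "Field: value" pair per line with no continuation lines, treats the // annotations above as documentation rather than wire format, and uses invented values for the version, times and server handle.

```python
def parse_message(text, name):
    """Parse 'NAME:' ... 'END NAME' into a dict of field -> value.

    Assumption: one 'Field: value' pair per line, no continuation lines.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != f"{name}:" or lines[-1] != f"END {name}":
        raise ValueError(f"not a well-formed {name} message")
    fields = {}
    for line in lines[1:-1]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

# A hypothetical poll asking for everything the polled server holds.
poll_text = (
    "POLL:\r\n"
    "Version-number: 1.0\r\n"
    "Start-time: 19930326000000\r\n"
    "End-time: 19930327000000\r\n"
    "Template: ALL\r\n"
    "Field: ALL\r\n"
    "Server-handle: index-d\r\n"
    "Authentication-type: NONE\r\n"
    "END POLL\r\n"
)
poll = parse_message(poll_text, "POLL")
wants_full_update = poll["Template"] == "ALL" and poll["Field"] == "ALL"
```

A message that lacks the terminating END line is rejected rather than partially accepted.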
-
- 5.3 Centroid change report
-
- CENTROID-CHANGES:
-    Version-number: // version number of pollee's index software, used to
-                    // ensure compatibility
-    Start-time: // change list starting time, GMT
-    End-time: // change list ending time, GMT
-    Server-handle: // IANA unique identifier of the responding server
-    Authentication-type: // Type of authentication used by pollee, or NONE
-    Authentication-data: // Data for authentication
-    Compression-type: // Type of compression used on the data, or NONE
-    Size-of-compressed-data: // size of compressed data if compression is used
-    Operation: // One of 3 keywords: ADD, DELETE, FULL
-               // ADD - add these entries to the centroid for this server
-               // DELETE - delete these entries from the centroid of this
-               // server
-               // FULL - the full centroid as of end-time follows
-    Multiple occurrences of the following block of fields:
-       Template: // a standard whois++ template name
-       Field: // a field name within that template
-       Data: // the word list itself, one per line, cr/lf terminated
-    end of multiply repeated block
- END CENTROID-CHANGES // This line must be used to terminate the centroid
-                      // change report
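-
One way a poller might apply such a report to the centroid it holds for the responding server is sketched below. The storage layout and the function name are hypothetical, and the sketch assumes that FULL replaces everything previously stored for that server; the draft does not say how FULL interacts with a poll that was limited to particular templates or fields.

```python
def apply_changes(centroid, operation, blocks):
    """Apply one CENTROID-CHANGES report to a stored centroid.

    centroid: {(template, field): set of words}, modified in place.
    blocks:   list of (template, field, words) triples from the report.
    """
    if operation == "FULL":
        centroid.clear()                     # the report replaces everything
    elif operation not in ("ADD", "DELETE"):
        raise ValueError(f"unknown operation: {operation!r}")
    for template, field, words in blocks:
        key = (template, field)
        if operation == "DELETE":
            remaining = centroid.get(key, set()) - set(words)
            if remaining:
                centroid[key] = remaining
            else:
                centroid.pop(key, None)      # drop word lists that become empty
        else:                                # ADD and FULL both insert words
            centroid.setdefault(key, set()).update(words)
    return centroid

stored = {}
apply_changes(stored, "FULL", [("User", "Last Name", ["Smith"])])
apply_changes(stored, "ADD", [("User", "Last Name", ["Jones"])])
after_add = {key: set(words) for key, words in stored.items()}
apply_changes(stored, "DELETE", [("User", "Last Name", ["Smith"])])
```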
-
- 5.4 Forwarded query
-
- FORWARDED-QUERY:
-    Version-number: // version number of forwarder's index software, used to
-                    // ensure compatibility
-    Forwarded-From: // IANA unique identifier of the server forwarding query
-    Forwarded-time: // time this query forwarded, GMT (used for debugging)
-    Trace-option: // YES if query has 'trace' option listed, NO if not.
-                  // used at message reception time to generate trace
-                  // information
-    Query-origination-address: // address of origin of query
-    Body-of-Query: // The original query goes here
-    Authentication-type: // Type of authentication used by queryer
-    Authentication-data: // Data for authentication
- END FORWARDED-QUERY // This line must be used to terminate the body of the
-                     // query
-
- 5.5 Query referral
-
- SERVERS-TO-ASK:
-    Version-number: // version number of index software, used to ensure
-                    // compatibility
-    Query-id: // some query identifier so the client knows which query to
-              // issue to the following servers
-    Body-of-Query: // the original query goes here
-    Next-Servers: // A list of servers to ask next, either IP addresses or
-                  // hostnames, one per line, cr/lf terminated
- END SERVERS-TO-ASK
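-
On the client side, following referrals amounts to a loop over the servers named in each reply. In the sketch below the ask callback and the shape of its replies are assumptions standing in for a real whois++ connection, and the hop limit is an arbitrary safety measure that the draft does not call for.

```python
def follow_referrals(query, start_server, ask, max_hops=10):
    """Resolve `query` by following SERVERS-TO-ASK replies.

    `ask(server, query)` is a hypothetical transport callback returning
    either {"records": [...]} or {"next_servers": [...]}.
    """
    pending, seen, records = [start_server], set(), []
    hops = 0
    while pending and hops < max_hops:
        server = pending.pop(0)
        if server in seen:
            continue                 # the mesh may reach a server twice
        seen.add(server)
        hops += 1
        reply = ask(server, query)
        records.extend(reply.get("records", []))
        pending.extend(reply.get("next_servers", []))
    return records

# A toy mesh in which two index servers both refer the client to server A.
replies = {
    "F": {"next_servers": ["D", "E"]},
    "D": {"next_servers": ["A"]},
    "E": {"next_servers": ["A"]},
    "A": {"records": ["John Smith"]},
}
asked = []

def ask(server, query):
    asked.append(server)
    return replies[server]

found = follow_referrals("smith", "F", ask)
```

Because servers can be reachable through more than one index server, the client remembers which servers it has already asked.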
-
- 6 References
-
- Deutsch, et al. Architecture of the WHOIS++ service. August 1992.
- Available by anonymous FTP as
- ucdavis.edu://pub/archive/wnils/Architecture.Overview
-
- 7 Authors' Addresses
-
- Chris Weider
- clw@merit.edu
- Industrial Technology Institute, Pod G
- 2901 Hubbard Rd,
- Ann Arbor, MI 48105
- O: (313) 747-2730
- F: (313) 747-3185
-
- Jim Fullton
- fullton@concert.net
- MCNC Center for Communications
- Post Office Box 12889
- 3021 Cornwallis Road
- Research Triangle Park
- North Carolina 27709-2889
- O: 919-248-1499
- F: 919-248-1405
-
- Simon Spero
- ses@sunsite.unc.edu
- 310 Wilson Library CB #3460
- University of North Carolina
- Chapel Hill, NC 27599-3460
- O: (919) 962-9107
- F: (919) 962-5604
-
-